Operadores de Seleção por Similaridade para Sistemas de Gerenciamento de Bases de Dados Relacionais

Authors

  • Adriano S. Arantes
  • Marcos R. Vieira
  • Caetano Traina
  • Agma J. M. Traina
Abstract

Search operations on complex datasets rely on similarity-based comparison criteria, because equality comparisons are rarely useful and comparisons based on ordering relationships cannot be applied, given the nature of these datasets. There are two basic operators for similarity queries: the Range Query and the k-Nearest Neighbor Query. A great deal of research has gone into effective algorithms for these operators. However, algorithms that handle these operators as parts of a more complex operation (a composition of them) have not yet been developed. This article presents two new algorithms, kAndRange and kOrRange, designed to answer conjunctions and disjunctions of those similarity criteria. The new algorithms were tested with sequential scan and with a metric access method called Slim-tree. Experimental results on real and synthetic datasets show that the new algorithms outperform the composition of the two operators on these complex similarity queries in all measured aspects, being up to 40 times faster. This is an essential point that will enable the practical use of similarity operators in Relational ...
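To make the query semantics concrete, here is a minimal Python sketch of what a conjunction and a disjunction of a Range Query and a k-Nearest Neighbor Query return. It is not the authors' kAndRange or kOrRange algorithm, which the abstract does not detail. It is a naive sequential scan that sorts every object by distance. The function names `k_and_range_scan` and `k_or_range_scan`, the Euclidean metric, and the tie-breaking by index are all assumptions made for illustration.

```python
import math


def _ranked(data, query, dist):
    """Return (distance, index) pairs sorted by distance, ties broken by index."""
    return sorted((dist(obj, query), i) for i, obj in enumerate(data))


def k_and_range_scan(data, query, k, radius, dist=math.dist):
    """Conjunction: objects that are among the k nearest AND within `radius`."""
    ranked = _ranked(data, query, dist)
    # Only the first k entries can qualify; keep those that also fall in range.
    return [i for d, i in ranked[:k] if d <= radius]


def k_or_range_scan(data, query, k, radius, dist=math.dist):
    """Disjunction: objects that are among the k nearest OR within `radius`."""
    ranked = _ranked(data, query, dist)
    # An object qualifies if it ranks in the top k or lies inside the radius.
    return [i for pos, (d, i) in enumerate(ranked) if pos < k or d <= radius]


# Hypothetical toy dataset: four points on a line, query at the origin.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
center = (0.0, 0.0)

and_result = k_and_range_scan(points, center, k=2, radius=2.5)
or_result = k_or_range_scan(points, center, k=2, radius=2.5)
```

With these semantics the conjunction never returns more than k objects, while the disjunction returns whichever of the two individual answers is larger. A dedicated algorithm can exploit both limits to prune work, instead of running the two operators separately and combining their results.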

Similar resources

A Tecnologia Objeto-Relacional em Ambientes de Data Warehouse: Uso de Séries de Tempo como Tipo de Dado Não Convencional

This article discusses the use of object-relational (OR) technology in Data Warehouse (DW) environments. In particular, it analyzes the feasibility of using time series as a non-conventional data type in a DW. The time dimension is fundamental in any DW, since these systems are meant to store historical data derived from various heterogeneous systems, ...

Uma Abordagem para Armazenamento de Dados Semi-Estruturados em Bancos de Dados Relacionais

This paper presents an approach to storing semistructured data in relational databases. We focus on semistructured data as extracted from Web pages by a tool called DEByE (Data Extraction By Example), and organized according to its data model, the DEByE Object Model (DEByE-OM). The approach presented here consists in representing the structure of objects extracted by DEByE by a relational schem...

Uma Estratégia para Seleção de Atributos Relevantes no Processo de Resolução de Entidades

Data integration is an essential task for achieving a unified view of data stored in heterogeneous and distributed sources. A key step in this process is the Entity Resolution, which consists of identifying instances that refer to the same real-world entity. Functions that evaluate the similarity between values of attributes are used to identify equivalent instances. This work proposes a strate...

Ambiente de gerenciamento de imagens e dados espaciais para desenvolvimento de aplicações em biodiversidade

There is a wide range of environmental applications requiring sophisticated management of several kinds of data, including spatial data and images of living beings. However, available information systems offer very limited support for managing such data in an integrated manner. This thesis provides a solution to combine these query requirements, which takes advantage of current digital library ...

Estratégias de Seleção de Conteúdo com Base na CST (Cross-document Structure Theory) para Sumarização Automática Multidocumento

This work presents the definition, formalization, and evaluation of content selection strategies for automatic multi-document summarization based on the discourse theory CST (Cross-document Structure Theory). The content selection task was modeled by means of operators that represent possible user preferences for the summarization. These operators are specified in tem...

Publication date: 2003